On Random Additive Perturbation for Privacy Preserving Data Mining
Authors
Abstract
Title of Thesis: On Random Additive Perturbation for Privacy Preserving Data Mining
Author: Souptik Datta, Master of Science, 2004
Thesis directed by: Dr. Hillol Kargupta, Associate Professor, Department of Computer Science and Electrical Engineering

Privacy is becoming an increasingly important issue in many data mining applications. This has triggered the development of many privacy-preserving data mining techniques. A large fraction of them use randomized data distortion techniques to mask sensitive data. This methodology attempts to hide the sensitive data by randomly modifying the data values, often using additive noise. This work analyzes the privacy-preserving ability of the random value distortion technique in data mining and points out the possible breach of privacy under random additive perturbation. It shows that random matrices have 'predictable' structures in the spectral domain, and it develops a random matrix-based Spectral Filtering Technique (SPF) to retrieve the original data from a dataset distorted by adding random values. The proposed method works by comparing the spectrum generated from the observed data with that of random matrices.

This work presents the theoretical foundation and extensive experimental results demonstrating that, in many cases, random data distortion preserves very little data privacy. It also presents direct comparisons with previously suggested privacy-preserving data mining techniques based on additive random perturbation to show the serious breach of privacy. Finally, it explores applying the proposed spectral filtering technique to other data types and perturbation methods, e.g., discrete data and exclusive-or (XOR) noise. The analytical framework presented in this work points out several possible avenues for developing new privacy-preserving data mining techniques.
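The spectral-filtering idea summarized above can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: it assumes i.i.d. zero-mean Gaussian noise with known variance sigma^2 and more records than attributes (m > n), and it uses the upper edge of the Marchenko-Pastur spectrum, sigma^2 (1 + sqrt(n/m))^2, as the "predictable" bound on eigenvalues produced by pure noise. Eigen-directions of the perturbed data's covariance above that bound are treated as signal, and the data are projected onto them. The function names `perturb`, `spectral_filter`, and `rmse`, and the synthetic low-rank dataset, are hypothetical.

```python
import numpy as np


def perturb(data: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Additive perturbation U = V + R, with R drawn i.i.d. from N(0, sigma^2)."""
    return data + rng.normal(0.0, sigma, size=data.shape)


def spectral_filter(perturbed: np.ndarray, sigma: float) -> np.ndarray:
    """Estimate V from U = V + R by keeping eigen-directions above the noise band.

    Sketch only: assumes i.i.d. noise with known variance sigma^2 and m > n.
    """
    m, n = perturbed.shape
    mean = perturbed.mean(axis=0)
    centered = perturbed - mean
    cov = centered.T @ centered / m          # n x n sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix, ascending eigenvalues
    # Upper edge of the Marchenko-Pastur law for a pure-noise covariance:
    # noise-only eigenvalues fall (asymptotically) below this bound.
    lam_max = sigma**2 * (1.0 + np.sqrt(n / m)) ** 2
    basis = eigvecs[:, eigvals > lam_max]    # columns spanning the signal subspace
    # Project onto the signal subspace, then restore the column means.
    return centered @ basis @ basis.T + mean


def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square difference between two equally shaped arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Hypothetical, strongly correlated data: rank-2 signal over 20 attributes.
rng = np.random.default_rng(0)
m, n, k = 1000, 20, 2
original = 3.0 * rng.normal(size=(m, k)) @ rng.normal(size=(k, n))
noisy = perturb(original, sigma=1.0, rng=rng)
estimate = spectral_filter(noisy, sigma=1.0)

print(f"RMSE of perturbed data: {rmse(noisy, original):.3f}")
print(f"RMSE after filtering:   {rmse(estimate, original):.3f}")
```

The reconstruction works in this example because the synthetic data are highly correlated: the signal concentrates in a few eigenvalues far above the noise band, while the noise spreads evenly over all n directions, so projecting onto the few signal directions discards most of it. For weakly correlated attributes the separation, and therefore the privacy breach, would likely be smaller.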
Similar resources
Additive Gaussian Noise Based Data Perturbation in Multi-level Trust Privacy Preserving Data Mining
Data perturbation is one of the most popular models used in privacy-preserving data mining. It is especially convenient for applications where the data owners need to export/publish the privacy-sensitive data. This work proposes an Additive Perturbation based Privacy Preserving Data Mining (PPDM) approach to deal with the problem of increasing accurate models about all data without knowing exact det...
Privacy Preserving Data Mining Using Additive Perturbation on Relational Streaming Data
Data mining is concerned with extracting the required important data from the database and ignoring the rest. With the success of data mining, privacy preservation has also acquired great importance. The new concept of privacy-preserving data mining (PPDM) is concerned with preserving the privacy of individuals' sensitive data. In this paper, privacy of sensitive attribute data concerned with individual...
On the Privacy Preserving Properties of Random Data Perturbation Techniques
Privacy is becoming an increasingly important issue in many data mining applications. This has triggered the development of many privacy-preserving data mining techniques. A large fraction of them use randomized data distortion techniques to mask the data for preserving the privacy of sensitive data. This methodology attempts to hide the sensitive data by randomly modifying the data values ofte...
Approval Sheet
Title of Dissertation: Multiplicative Data Perturbation for Privacy Preserving Data Mining. Kun Liu, Doctor of Philosophy, 2007. Dissertation directed by: Dr. Hillol Kargupta, Associate Professor, Department of Computer Science and Electrical Engineering. Recent interest in the collection and monitoring of data using data mining technology for the purpose of security and business-related application...
Random Matrices and Applications to Data Filtering
by Qi Wang, M.S., Washington State University, December 2003. Chair: Krishnamoorthy Sivakumar. Preserving privacy is becoming an important issue in data mining. Random perturbation is a widely used technique to protect the privacy of sensitive data values. This technique hides the true data records by modifying the data values using additive random noise, but can still estimate the data distribution fr...